-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[flang][OpenMP] Basic mapping of do concurrent ... reduce
to OpenMP
#146033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang][OpenMP] Basic mapping of do concurrent ... reduce
to OpenMP
#146033
Conversation
@llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) ChangesNow that we have changes introduced by #145837, mapping reductions from TODO: Add tests Full diff: https://github.com/llvm/llvm-project/pull/146033.diff 1 Files Affected:
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index 709cf1d0938fa..31076f6eb328f 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -312,6 +312,19 @@ class DoConcurrentConversion
bool isComposite) const {
mlir::omp::WsloopOperands wsloopClauseOps;
+ auto cloneFIRRegionToOMP = [&rewriter](mlir::Region &firRegion,
+ mlir::Region &ompRegion) {
+ if (!firRegion.empty()) {
+ rewriter.cloneRegionBefore(firRegion, ompRegion, ompRegion.begin());
+ auto firYield =
+ mlir::cast<fir::YieldOp>(ompRegion.back().getTerminator());
+ rewriter.setInsertionPoint(firYield);
+ rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(),
+ firYield.getOperands());
+ rewriter.eraseOp(firYield);
+ }
+ };
+
// For `local` (and `local_init`) opernads, emit corresponding `private`
// clauses and attach these clauses to the workshare loop.
if (!loop.getLocalVars().empty())
@@ -326,50 +339,65 @@ class DoConcurrentConversion
TODO(localizer.getLoc(),
"local_init conversion is not supported yet");
- auto oldIP = rewriter.saveInsertionPoint();
+ mlir::OpBuilder::InsertionGuard guard(rewriter);
rewriter.setInsertionPointAfter(localizer);
+
auto privatizer = rewriter.create<mlir::omp::PrivateClauseOp>(
localizer.getLoc(), sym.getLeafReference().str() + ".omp",
localizer.getTypeAttr().getValue(),
mlir::omp::DataSharingClauseType::Private);
- if (!localizer.getInitRegion().empty()) {
- rewriter.cloneRegionBefore(localizer.getInitRegion(),
- privatizer.getInitRegion(),
- privatizer.getInitRegion().begin());
- auto firYield = mlir::cast<fir::YieldOp>(
- privatizer.getInitRegion().back().getTerminator());
- rewriter.setInsertionPoint(firYield);
- rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(),
- firYield.getOperands());
- rewriter.eraseOp(firYield);
- }
-
- if (!localizer.getDeallocRegion().empty()) {
- rewriter.cloneRegionBefore(localizer.getDeallocRegion(),
- privatizer.getDeallocRegion(),
- privatizer.getDeallocRegion().begin());
- auto firYield = mlir::cast<fir::YieldOp>(
- privatizer.getDeallocRegion().back().getTerminator());
- rewriter.setInsertionPoint(firYield);
- rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(),
- firYield.getOperands());
- rewriter.eraseOp(firYield);
- }
-
- rewriter.restoreInsertionPoint(oldIP);
+ cloneFIRRegionToOMP(localizer.getInitRegion(),
+ privatizer.getInitRegion());
+ cloneFIRRegionToOMP(localizer.getDeallocRegion(),
+ privatizer.getDeallocRegion());
wsloopClauseOps.privateVars.push_back(op);
wsloopClauseOps.privateSyms.push_back(
mlir::SymbolRefAttr::get(privatizer));
}
+ if (!loop.getReduceVars().empty()) {
+ for (auto [op, byRef, sym, arg] : llvm::zip_equal(
+ loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(),
+ loop.getReduceSymsAttr().getAsRange<mlir::SymbolRefAttr>(),
+ loop.getRegionReduceArgs())) {
+ auto firReducer =
+ mlir::SymbolTable::lookupNearestSymbolFrom<fir::DeclareReductionOp>(
+ loop, sym);
+
+ mlir::OpBuilder::InsertionGuard guard(rewriter);
+ rewriter.setInsertionPointAfter(firReducer);
+
+ auto ompReducer = rewriter.create<mlir::omp::DeclareReductionOp>(
+ firReducer.getLoc(), sym.getLeafReference().str() + ".omp",
+ firReducer.getTypeAttr().getValue());
+
+ cloneFIRRegionToOMP(firReducer.getAllocRegion(),
+ ompReducer.getAllocRegion());
+ cloneFIRRegionToOMP(firReducer.getInitializerRegion(),
+ ompReducer.getInitializerRegion());
+ cloneFIRRegionToOMP(firReducer.getReductionRegion(),
+ ompReducer.getReductionRegion());
+ cloneFIRRegionToOMP(firReducer.getAtomicReductionRegion(),
+ ompReducer.getAtomicReductionRegion());
+ cloneFIRRegionToOMP(firReducer.getCleanupRegion(),
+ ompReducer.getCleanupRegion());
+
+ wsloopClauseOps.reductionVars.push_back(op);
+ wsloopClauseOps.reductionByref.push_back(byRef);
+ wsloopClauseOps.reductionSyms.push_back(
+ mlir::SymbolRefAttr::get(ompReducer));
+ }
+ }
+
auto wsloopOp =
rewriter.create<mlir::omp::WsloopOp>(loop.getLoc(), wsloopClauseOps);
wsloopOp.setComposite(isComposite);
Fortran::common::openmp::EntryBlockArgs wsloopArgs;
wsloopArgs.priv.vars = wsloopClauseOps.privateVars;
+ wsloopArgs.reduction.vars = wsloopClauseOps.reductionVars;
Fortran::common::openmp::genEntryBlock(rewriter, wsloopArgs,
wsloopOp.getRegion());
@@ -393,7 +421,8 @@ class DoConcurrentConversion
clauseOps.loopLowerBounds.size())))
rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
- for (unsigned i = 0; i < loop.getLocalVars().size(); ++i)
+ for (unsigned i = 0;
+ i < loop.getLocalVars().size() + loop.getReduceVars().size(); ++i)
loopNestOp.getRegion().eraseArgument(clauseOps.loopLowerBounds.size());
return loopNestOp;
|
b99122b
to
c99a502
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but please could you add a test that covers every region.
5f665c9
to
105ac7a
Compare
c99a502
to
34a5df1
Compare
105ac7a
to
3e2bdf5
Compare
34a5df1
to
185b43f
Compare
Thanks for the review. Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
3e2bdf5
to
15ed831
Compare
185b43f
to
8c25fe7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
15ed831
to
53f9c0d
Compare
8c25fe7
to
39e6ed7
Compare
…lled in OpenMP and OpenACC This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - llvm#145837 (this one) - llvm#146025 - llvm#146028 - llvm#146033
With llvm#145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - llvm#145837 - llvm#146025 (this one) - llvm#146028 - llvm#146033
Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - llvm#145837 - llvm#146025 - llvm#146028 (this one) - llvm#146033
Now that we have changes introduced by llvm#145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping. PR stack: - llvm#145837 - llvm#146025 - llvm#146028 - llvm#146033 (this one)
…lled in OpenMP and OpenACC (#145837) This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - #145837 (this one) - #146025 - #146028 - #146033
53f9c0d
to
8e67de7
Compare
…ns are modelled in OpenMP and OpenACC (#145837) This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - llvm/llvm-project#145837 (this one) - llvm/llvm-project#146025 - llvm/llvm-project#146028 - llvm/llvm-project#146033
8e67de7
to
df63fea
Compare
… (#146025) With #145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 (this one) - llvm/llvm-project#146028 - llvm/llvm-project#146033
Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping.
39e6ed7
to
c1834bb
Compare
…defintion (#146028) Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 - llvm/llvm-project#146028 (this one) - llvm/llvm-project#146033
…` to OpenMP (#146033) Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 - llvm/llvm-project#146028 - llvm/llvm-project#146033 (this one)
Now that we have changes introduced by #145837, mapping reductions from
do concurrent
to OpenMP is almost trivial. This PR adds such mapping.PR stack:
reduce
to match reductions are modelled in OpenMP and OpenACC #145837ReductionProcessor
toLower/Support
. #146025fir_DoConcurrentLoopOp
's defintion #146028do concurrent ... reduce
to OpenMP #146033 (this one)